Abstract



Two questions were investigated: (1) Does a general kind of validation
experience improve the accuracy of clinical judgments? (2) Do clinical
judges know when to use their heads instead of the formula? These questions
were studied using judges known to predict educational criteria at
relatively high, moderate, and low levels of accuracy. The results revealed
that the accuracy of predictions of freshman and overall college grades
did not improve after the validation experience; in fact, some evidence
showed a decrease in accuracy. Further, the judges were clearly unable to
improve predictive accuracy by attempting to recognize when to deviate
from the formula.



Do Counselors Know When to Use Their 
Heads Instead of the Formula?¹

Donivan J. Watley 

Many questions remain unanswered in determining the relative efficiency
of clinical and statistical methods of prediction. Answers were sought in
this study to two questions specifically concerned with the predictive skill
of clinical judges. The first relates to the argument of Holt (1958) and
Gough (1962) that competitive clinical versus statistical prediction studies
have not provided clinical judges with the same initial validation
experiences available to the statistical method. That is, the statistical
method is first developed on the same kind of sample and against the same
criterion that is used in the comparative studies of the two predictive
methods. Yet the clinical judge typically is required to make predictions
without having had any planned validation experience with the criterion
prior to the competitive run. The present study provided clinical judges
with one kind of prediction experience to determine whether this had any
noticeable effect upon the accuracy of their forecasts.

The second question concerns Meehl's (1957) inquiry: When shall we use
our heads instead of the formula? His analysis of a sizeable number of
comparative clinical and statistical prediction studies led him to conclude
that forecasts of outcome or institutional-type criteria (e.g., college
grades) will be more accurate in the long run when they are based on the
actuarial method. Only in unusual circumstances should the clinical judge
use his head (rely on his clinical "skills") rather than use the formula.
Meehl suggests that the clinical judge use his head only when "... the
psychological situation is as clear as a broken leg; otherwise, very very
seldom" (1957, p. 273). But the important question remains: Does the
clinical judge know when to deviate from the formula, i.e., recognize the
"broken leg"?

¹ The data used in this study were collected while the author was on the
staff of the Student Counseling Bureau, University of Minnesota, Minneapolis.

Whether the clinical judge knows when to deviate from the formula is a 
question of considerable practical importance that surprisingly has received 
virtually no research attention. Since in the actual prediction situation 
the judge usually has the statistically derived prediction, if one is avail- 
able, in addition to other case data, what really matters is whether the 
judge is able to use all of this information efficiently. The typical 
clinical versus statistical prediction study is designed unrealistically 
because the actuarial prediction itself is withheld from the clinical judge. 

Method 

Clinical Judges and the Validation Experience 

Eighteen counselors took part in this study, all of whom participated
in a previous investigation (Watley, 1966b) that assessed the predictive
skill of individual counselors. A total of 66 high school and college
counselors were in the first study, and the 18 included in this study were
specifically selected on the basis of their ability to predict: (1) freshman
grades, (2) overall college grades, and (3) whether students would persist
and be successful in the educational programs they selected at the time
of admission to college.

Based on prediction records, the counselors were ranked from 1 to 66
on each of the three criteria. The two ranks for freshman and overall
grades were then combined, leaving one set of ranks for accuracy in
forecasting grades and the other for judging persistence and graduation from
initial educational programs. Counselors were identified who ranked in the
top one-third (ranks 1 to 22), in the middle one-third (ranks 23 to 44),
and in the bottom one-third (ranks 45 to 66) on each of the two sets of
rankings. Of the counselors identified at each level, six were randomly
selected to participate in this study; they were labeled respectively
the high, moderate, and low accuracy groups. Use of these three groups made
it possible to examine whether the validation experience was differentially
related to the ability to predict accurately.
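The selection procedure described above can be sketched in code. This is only an illustration under stated assumptions: it reads "on each of the two sets of rankings" as requiring a counselor to fall in the same third on both rankings, and the function names, the seed, and the example rankings are hypothetical rather than taken from the study.

```python
import random

# Rank ranges for the three thirds, as given in the text.
TERTILES = {"high": range(1, 23), "moderate": range(23, 45), "low": range(45, 67)}

def level_on_both(grade_rank, persistence_rank):
    """Return the accuracy level if both ranks fall in the same third, else None."""
    for level, ranks in TERTILES.items():
        if grade_rank in ranks and persistence_rank in ranks:
            return level
    return None

def select_judges(rankings, per_group=6, seed=0):
    """rankings: {counselor_id: (grade_rank, persistence_rank)}.

    Returns {level: sorted list of randomly chosen counselor ids}.
    Raises ValueError if a level has fewer than per_group eligible counselors.
    """
    pools = {level: [] for level in TERTILES}
    for counselor_id, (grade_rank, persistence_rank) in rankings.items():
        level = level_on_both(grade_rank, persistence_rank)
        if level is not None:
            pools[level].append(counselor_id)
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {level: sorted(rng.sample(pool, per_group)) for level, pool in pools.items()}

# Hypothetical rankings in which every counselor holds the same rank on both lists.
example_rankings = {i: (i, i) for i in range(1, 67)}
groups = select_judges(example_rankings)
print({level: len(ids) for level, ids in groups.items()})
```

Six counselors per level gives the 18 judges reported in the text.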

Prediction experience was acquired, therefore, in the first study.
Predictions were made for the same sample of 100 cases in each of three
conditions that differed in the type and amount of case information
available. However, the judges were unaware that the same cases were included
in each condition. The exact data provided in each condition can be found by
referring to the initial study (Watley, 1966b).

The present study was conducted approximately one year after the first
investigation. The following procedure was used to provide judges with
further information about the prediction task. Approximately two months
prior to this study each judge was given a report of the results obtained
in the initial investigation (Watley & Vance, 1964). This report included
information (listed by counselor identification number) about the number of
correct predictions each judge made for each condition and the correlation
coefficient between each judge's predictions and the grades actually obtained
by students. In addition, specific data were provided about the case
variables most highly related to the predicted criteria, as well as the
differences in data typically used by judges who predict at relatively high,
moderate, and low levels of accuracy. Other information included: the
relationship between counselor confidence in their judgments and actual
predictive accuracy; the effect of place of employment (high school or
college) on counselor predictive accuracy; the reliability of counselor
judgments; and psychometric and biographic differences between counselors
who predict educational criteria most or least accurately. About two days
before making judgments in this study the judges were contacted and asked to
review this material. The investigator then talked individually with each
judge; two things were discussed: (1) the judge's performance in the first
study and (2) information contained in the report that might generally be
useful to improve predictive accuracy of grades. However, this was designed
as a self-learning process in which information was provided but the judge
was left to integrate it for himself.

The clinical judges predicted both freshman and overall college grades
in this study. The effect of the validation experience was determined by
comparing the number of correct predictions made in the initial study with
the number made in the present study. A hit was defined as a correct
dichotomized prediction for a student to earn a grade average of "C or
higher" or "less than C," based on grades actually earned.
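The hit criterion defined above can be written as a small function. This is a minimal sketch: the numeric cutoff of 2.0 for a "C" average (a 4-point scale) and the example values are assumptions for illustration, not figures from the study.

```python
C_AVERAGE = 2.0  # assumption: "C" corresponds to 2.0 on a 4-point scale

def is_hit(predicted_c_or_higher, earned_average, cutoff=C_AVERAGE):
    """A hit is a dichotomized prediction that matches the earned grade average."""
    return predicted_c_or_higher == (earned_average >= cutoff)

def count_hits(predictions, earned_averages, cutoff=C_AVERAGE):
    """Count hits over paired predictions (True = "C or higher") and earned averages."""
    if len(predictions) != len(earned_averages):
        raise ValueError("each prediction needs exactly one earned average")
    return sum(is_hit(p, g, cutoff) for p, g in zip(predictions, earned_averages))

# Hypothetical judge: four predictions against four earned averages.
print(count_hits([True, False, True, False], [2.5, 1.8, 1.9, 2.0]))  # 2
```

In the example, the first two predictions are hits; the third predicts "C or higher" for a student below the cutoff, and the fourth predicts "less than C" for a student exactly at it.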

Deviation from the Formula 

The judges were asked first to make freshman and overall college grade
predictions for 50 cases. As indicated, this set of predictions was
compared with predictions made in the earlier study (Watley, 1966b) to assess
the effect of the validation experience. This set of predictions was also
used to determine whether judges recognized when to deviate from the formula.
After forecasts were made for all cases, the judges were then asked to go
back through each case folder again; only this time the statistical
predictions for freshman and overall college grades were also available. The
judge's job was to decide whether he should deviate from the statistical
prediction in order to improve predictive accuracy. He was also aware of
his first predictions for each case, made when the statistical predictions
were not available.

Whether the judge recognized when to deviate from the formula was
assessed in two ways: (1) the accuracy of his forecasts with and without the
availability of the statistical predictions, and (2) the accuracy of his
forecasts in comparison with the accuracy of statistical predictions.

The statistical predictions were cross-validated and were based on an
equation that included high school rank (HSR), the Minnesota Scholastic
Aptitude Test (MSAT), and the Cooperative English Test (CET).

Prediction Sample and Case Data 

The sample was composed of 50 males who entered the College of Science,
Literature, and the Arts (SLA) at the University of Minnesota as first-
quarter freshmen in the fall of 1959. These students were randomly
selected from among the entire entering class of freshman males. However,
inclusion depended on the availability of all of the desired psychometric
and biographic case data, graduation from a Minnesota high school during
the spring of 1959, and at least one quarter spent in SLA.

Each case folder contained information related to scholastic aptitude
and past academic achievement. Test scores were provided for the MSAT, the
CET, and the Social Studies Test of the Sequential Tests of Educational
Progress. Achievement data included each student's HSR and the last high
school grades earned in the areas of mathematics, English, social studies,
and natural sciences. Also included were results for the Strong Vocational
Interest Blank and the Minnesota Multiphasic Personality Inventory, plus
considerable biographic information given on the Minnesota College Admissions
Form and the Personal Inventory for Entering Students.

Statistical data were also provided to each judge for use in making
predictions. These included freshman grade expectancy tables for HSR,
MSAT, and the CET, and a regression equation that included prediction
coefficients for the high school grades of mathematics, English, social
studies, and natural sciences.

The type and amount of case information provided in these folders
corresponded to the third condition under which judgments were made in the
initial study (Watley, 1966b). Essentially, these folders contained all of
the data that were available for this group of students before they entered
college. Therefore, the number of correct predictions in this study was
compared with the number of hits made by judges in the third condition of
the first investigation. However, since judgments were made for 100 cases
in the first study and 50 in this one, the total number of correct
forecasts obtained by each judge in the first study was divided by two in
order to make the number of cases comparable for the two investigations.
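The rescaling step amounts to simple proportion. A minimal sketch follows; the function name and the example hit count of 72 are hypothetical and are not data from the study.

```python
def comparable_hits(hits_first_study, cases_first_study=100, cases_this_study=50):
    """Rescale a first-study hit count to the number of cases judged in this study."""
    return hits_first_study * cases_this_study / cases_first_study

# Hypothetical judge with 72 hits on the 100 first-study cases.
print(comparable_hits(72))  # 36.0
```

With 100 and 50 cases this is the same as dividing by two, as described in the text.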


Results and Discussion 

Does Validation Experience Affect the Accuracy of Clinical Judgments?

Table 1 shows the mean number of correct forecasts made by the high,
moderate, and low accuracy groups of judges both before and after the
validation experience. An analysis of variance was computed separately for
each predicted criterion.

Table 1

Mean Number of Hits Obtained by Judges Before
and After the Validation Experience

                              Level of Predictive Skill
Validation            High              Moderate               Low
Experience      First year   O-A    First year   O-A    First year   O-A

Before    Mn       36.1     32.7       34.7     30.0       31.5     27.9
          SD        1.5      1.6        2.5      1.0        3.9      2.1

After     Mn       36.8     30.5       32.5     27.2       29.0     27.8
          SD        1.6      2.0        5.8      2.6        5.8      2.8

The main concern of these analyses was whether significantly more hits
were obtained by the judges after the validation experience. The F found
for assessing this difference for freshman grades was not significant at
the .05 level. Table 1 shows that the most accurate judges obtained about
the same mean number of hits after the validation experience, while the
moderate and least accurate judges made slightly fewer hits. Thus, no
evidence was obtained that the previous prediction experience and the
feedback information the judges received aided in producing more accurate
judgments.

As expected, however, the F of 13.17 obtained for assessing the
differences among the means for the high, moderate, and low accuracy groups
was significant beyond the .001 level. The interaction term was not
significant at the .05 level.

For the overall college grade judgments, the obtained F of 5.19 for
assessing the effect of the validation experience was significant at the
.05 level. Surprisingly, however, the results were the opposite of what
might have been anticipated. Rather than improving accuracy, Table 1 shows
that the high and moderate level judges predicted less accurately after the
validation experience.

The F of 18.70 for assessing the mean differences among judges who
predict at high, moderate, and low levels of accuracy was significant beyond
the .01 level. This was expected. The interaction term was not
significant at the .05 level (F = 2.48).

Obviously, the kind of validation experience provided judges in this
study did not help improve their predictive ability. What this apparently
means is that familiarity with general information that could be useful in
improving predictive ability is not sufficient. Both Soskin (1954) and Crow
(1957) found similar results to the extent that accuracy failed to improve
under conditions that were not well defined. As was found here, Crow's
judges were somewhat less accurate in interpersonal perception after
training, a loss that seemed related to a decreased sensitivity to individual
differences. In this study it is likely that some of the judges were unable
to effectively integrate this new information, became somewhat confused,
and predicted overall college grades less accurately than they would have
without these data to synthesize.

Perhaps in addition to general information, a systemized form of
immediate feedback after specific predictions would be more successful in
building internal norms and, thus, help to improve the accuracy of clinical
judgments of this type. Taft (1955) previously suggested this possibility
and Oskamp's (1962) research demonstrated some success with this approach.
However, the question then becomes: to what extent should one go in order
to train clinical judges to predict institutional-type criteria as
accurately as the equation can do already? Theoretically, specific training
would be necessary for every specific criterion. Perhaps the clinical judge's
time would be better spent analyzing and improving his predictions of
criteria for which the statistical method is not applicable.

Does the Judge Recognize When to Deviate from the Formula?

The first analysis was a comparison of the accuracy of judgments made
with and without the availability of statistical predictions. The latter
judgments were made with instructions to decide when to deviate from the
formula, i.e., recognize the "broken leg" cases. The mean numbers of hits
obtained by the judges under these two conditions are shown in Table 2.

Table 2

Mean Number of Hits Obtained by Judges in
"Deviating from the Formula"

Availability                  Level of Predictive Skill
of Statistical        High              Moderate               Low
Predictions     First year   O-A    First year   O-A    First year   O-A

Without   Mn       36.8     30.5       32.5     27.2       29.0     27.8
          SD        1.8      2.0        5.8      2.6        5.8      2.8

With      Mn       36.0     30.3       32.5     28.3       29.8     28.3
          SD        1.4      1.4        5.4      2.1        3.9      3.5


For freshman grades, the F for assessing the correct predictions made
by judges under the two conditions was not significant. In fact, the total
mean number of hits (32.8) for the three groups of judges was identical for
both conditions. Thus, not only were the judges unable to effectively
decide when to deviate from the formula, but the statistical predictions had
relatively little effect in any direction on the accuracy of their forecasts.
The F of 13.57 for assessing the differences among the three accuracy groups
was significant beyond the .01 level, and the interaction term was not
significant at the .05 level.
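The identical total mean of 32.8 can be checked directly from the freshman-grade means reported in Table 2. The sketch below assumes equal group sizes (six judges per group, as described under Method), so the total mean is the simple average of the three group means; the variable and function names are illustrative only.

```python
# Freshman-grade mean hits from Table 2, by availability of statistical predictions.
freshman_means = {
    "without": {"high": 36.8, "moderate": 32.5, "low": 29.0},
    "with": {"high": 36.0, "moderate": 32.5, "low": 29.8},
}

def overall_mean(group_means):
    """Average of group means; equals the grand mean when groups are the same size."""
    return sum(group_means.values()) / len(group_means)

for condition, means in freshman_means.items():
    print(condition, round(overall_mean(means), 1))
```

Both conditions sum to 98.3 hits across the three groups, which gives 32.8 after rounding in each case.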

The results found for the overall college grade predictions were
essentially the same. The F for assessing the differences with and without
the statistical predictions available was not significant at the .05 level,
but the F of 7.27 for the three accuracy groups was significant at the .05
level. Also noteworthy with this prediction is the fact that no differences
were observed between the "moderate" and "low" level judges in the number of
hits made. However, three criteria were used in the initial selection of
the three accuracy groups and there was little variation among the judges
in their ability to predict overall college grades.

The second analysis compared the number of hits made by judges when
they attempted to recognize the "broken leg" cases with the number of
correct predictions made by the actuarial method. The equation that included
HSR, MSAT, and the CET correctly predicted "C or better" or "less than C"
freshman grades for 35 cases and overall college grades for 31 cases. Table
2 shows that the most accurate judges were able to make forecasts of both
criteria about as accurately as the statistical method. An analysis of
their individual judgments showed that they tended to remain rather closely
in agreement with the statistical predictions.
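The comparison with the actuarial method can be laid out as differences between each group's mean hits (with the statistical predictions available, from Table 2) and the hits of the equation (35 freshman, 31 overall, out of 50 cases). This is a descriptive sketch only, not a significance test, and the names used are illustrative.

```python
CASES = 50
formula_hits = {"first_year": 35, "overall": 31}

# Mean hits per group with the statistical predictions available (Table 2).
judge_means_with = {
    "high": {"first_year": 36.0, "overall": 30.3},
    "moderate": {"first_year": 32.5, "overall": 28.3},
    "low": {"first_year": 29.8, "overall": 28.3},
}

def difference_from_formula(means):
    """Mean judge hits minus formula hits, per criterion, rounded to one decimal."""
    return {crit: round(means[crit] - formula_hits[crit], 1) for crit in formula_hits}

for level, means in judge_means_with.items():
    print(level, difference_from_formula(means))

# Hit rate of the equation on each criterion.
print({crit: hits / CASES for crit, hits in formula_hits.items()})
```

Only the high-accuracy group comes within about one hit of the equation on both criteria; the moderate and low groups fall below it on both.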

Judges who predicted at the moderate and lowest levels were inclined to
deviate more frequently from the statistical predictions, preferring to
remain in agreement with their initial judgments made without the statistical
forecasts. As Table 2 shows, the availability of the statistical predictions
had no noticeable effect on the accuracy of their judgments. Although
demonstrating confidence in their predictions, this also reveals that the
poorer judges failed to learn from the information provided to them earlier.
For example, they did not learn that judges who predict educational criteria
least accurately tend to express more confidence in their forecasts than
judges who predict most accurately (Watley, 1966a); or that they were more
likely to improve predictive accuracy of institutional-type criteria by
sticking rather closely to the statistically derived forecasts.

Thus the results obtained were disappointing. The judges who
previously demonstrated the highest level of predictive ability were unable
to improve on the accuracy of the statistical method by recognizing "broken
leg" cases in which the statistical forecast was likely to be in error.
However, the best judges tended to approach this task cautiously, unwilling
to trust their judgment to select likely "deviate" cases. More alarming,
however, is the fact that counselors in the moderate and low level groups
stubbornly persisted in believing in the correctness of their own judgments
in spite of rather powerful evidence to the contrary. In the final
analysis, Meehl's warning is as appropriate as before except that in making
forecasts of institutional criteria it seems that the judge should deviate
from the formula "very, very, very seldom."